Search for: All records

Creators/Authors contains: "Del_Alamo, Jose M"

  1. Previous studies have demonstrated that privacy issues in mobile apps often stem from the integration of third-party libraries (TPLs). To shed light on the factors that contribute to these issues, we investigate the privacy-related configuration choices available to and made by Android app developers who incorporate the Facebook Android SDK and Facebook Audience Network SDK in their apps. We compile these Facebook SDKs’ privacy-related settings and their defaults. Employing a multi-method approach that integrates static and dynamic analysis, we analyze more than 6,000 popular apps to determine whether they incorporate the Facebook SDKs and, if so, whether and how developers modify the settings. Finally, we assess how these settings align with the privacy practices that developers disclose in the apps’ privacy labels and policies. We observe widespread inconsistencies between practices and disclosures in popular apps. These inconsistencies often stem from privacy settings, including a substantial number of cases in which apps retain default settings over alternatives that offer greater privacy. We observe fewer possible compliance issues in potentially child-directed apps, but issues persist even there. We discuss remediation strategies that SDK and TPL providers could employ to help developers, particularly those with fewer resources who rely heavily on SDKs. Our recommendations include aligning default privacy settings with data minimization principles and other conservative practices, and making privacy-related SDK information both easier to find and harder to miss.
    Free, publicly-accessible full text available April 1, 2026
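    To illustrate the static-analysis side of such a pipeline, the following minimal Python sketch scans a decoded AndroidManifest.xml (e.g., as produced by apktool) for two documented privacy-related Facebook SDK settings and falls back to the documented defaults when an app leaves them unset. The file path is hypothetical, and this is an assumed simplification for illustration, not the paper's actual tooling.

        # Minimal sketch (assumed): read privacy-related Facebook SDK settings
        # from a decoded AndroidManifest.xml. Both keys are documented SDK
        # meta-data entries; their documented defaults enable collection ("true").
        import xml.etree.ElementTree as ET

        ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
        SETTINGS = {  # meta-data key -> documented default value
            "com.facebook.sdk.AutoLogAppEventsEnabled": "true",
            "com.facebook.sdk.AdvertiserIDCollectionEnabled": "true",
        }

        def facebook_sdk_settings(manifest_path):
            """Return each setting's effective value, using the default if unset."""
            root = ET.parse(manifest_path).getroot()
            found = {
                meta.get(ANDROID_NS + "name"): meta.get(ANDROID_NS + "value")
                for meta in root.iter("meta-data")
            }
            return {key: found.get(key, default) for key, default in SETTINGS.items()}

        # Hypothetical path to a manifest decoded with apktool:
        print(facebook_sdk_settings("decoded_app/AndroidManifest.xml"))

    An app that never touches these keys inherits the less private defaults, which is exactly the retained-default pattern the study flags.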
  2. The number and dynamic nature of web sites and mobile applications present regulators and app store operators with significant challenges when it comes to enforcing compliance with applicable privacy and data protection laws. Over the past several years, people have turned to Natural Language Processing (NLP) techniques to automate privacy compliance analysis (e.g., comparing statements in privacy policies with analysis of the code and behavior of mobile apps) and to answer people’s privacy questions. Traditionally, these NLP techniques have relied on labor-intensive and potentially error-prone manual annotation processes to build the corpora necessary to train them. This article explores and evaluates the use of Large Language Models (LLMs) as an alternative for effectively and efficiently identifying and categorizing a variety of data practice disclosures found in the text of privacy policies. Specifically, we report on the performance of ChatGPT and Llama 2, two particularly popular LLM-based tools, including engineering prompts and evaluating different configurations of each. Evaluation of the resulting techniques on well-known corpora of privacy policy annotations yields an F1 score exceeding 93%, higher than the scores previously reported in the literature on these benchmarks. This performance is obtained at minimal marginal cost (excluding the cost required to train the foundational models themselves). These results, which are consistent with those reported in other domains, suggest that LLMs offer a particularly promising approach to automated privacy policy analysis at scale.
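    As a rough sketch of the general approach (not the authors' actual prompts, models, or configurations), the following Python snippet asks an OpenAI chat model to label a policy segment with one of several OPP-115-style data practice categories; the model name, prompt wording, and category subset are placeholders.

        # Minimal sketch (assumed): LLM-based labeling of a privacy policy
        # segment. Requires the openai package and an OPENAI_API_KEY.
        from openai import OpenAI

        CATEGORIES = [  # a subset of the OPP-115 data practice categories
            "First Party Collection/Use", "Third Party Sharing/Collection",
            "Data Retention", "Data Security", "User Choice/Control",
        ]

        def classify_segment(segment: str) -> str:
            client = OpenAI()
            prompt = (
                "Label the following privacy policy segment with the single "
                f"best-matching category from {CATEGORIES}. Reply with the "
                "category name only.\n\nSegment: " + segment
            )
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder, not a model evaluated above
                messages=[{"role": "user", "content": prompt}],
                temperature=0,  # deterministic labels simplify evaluation
            )
            return resp.choices[0].message.content.strip()

        print(classify_segment("We retain your data while your account is active."))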
  3. 115 privacy policies from the OPP-115 corpus have been re-annotated with the specific data retention periods disclosed, in line with the disclosure requirements of GDPR Art. 13(2)(a). The retention periods have been categorized into six distinct cases:
     - C0: No data retention period is indicated in the privacy policy/segment.
     - C1: A specific data retention period is indicated (e.g., days, weeks, months...).
     - C2: It is indicated that the data will be stored indefinitely.
     - C3: A criterion is indicated from which a defined storage period can be inferred (e.g., as long as the user has an active account).
     - C4: It is indicated that personal data will be stored for an unspecified period, for fraud prevention, legal, or security reasons.
     - C5: It is indicated that personal data will be stored for an unspecified period, for purposes other than fraud prevention, legal, or security.
     Note: If the privacy policy or segment accounts for more than one case, the case with the highest value was annotated (e.g., if cases C2 and C4 both apply, C4 is annotated). This ground truth dataset then served as validation for our proposed ChatGPT-based method, whose results are also included in the dataset. Column descriptions:
     - policy_id: ID of the policy in the OPP-115 dataset
     - policy_name: Domain of the privacy policy
     - policy_text: Privacy policy text collected at the time of OPP-115 dataset creation
     - info_type_value: Type of personal data to which the retention disclosure refers
     - retention_period: Retention period annotated by the OPP-115 annotators
     - actual_case: Our annotated case, ranging from C0 to C5
     - GPT_case: ChatGPT's classification of the case identified in the segment
     - actual_Comply_GDPR: Boolean; True if the policy apparently complies with GDPR (cases C1-C5), False if not (case C0)
     - GPT_Comply_GDPR: The same Boolean as derived from ChatGPT's classification
     - paragraphs_retention_period: List of the paragraphs annotated as Data Retention by the OPP-115 annotators, together with our red text describing the relevant information used for our annotation decision
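    The two annotation rules above are mechanical enough to state in code; a minimal sketch follows (the function names are ours, not part of the dataset).

        # Minimal sketch (assumed helper names): resolve multiple applicable
        # retention cases and derive the apparent-GDPR-compliance flag.
        def resolve_case(applicable_cases):
            """Highest-numbered case wins, e.g., {"C2", "C4"} -> "C4"."""
            return max(applicable_cases, key=lambda case: int(case[1:]))

        def complies_with_gdpr(case):
            """Only C0 (no retention period disclosed) counts as non-compliant."""
            return case != "C0"

        case = resolve_case({"C2", "C4"})
        print(case, complies_with_gdpr(case))  # -> C4 True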
  4. In an era marked by ubiquitous reliance on mobile applications for nearly every need, the opacity of apps’ behavior poses significant threats to their users’ privacy. Although major data protection regulations require apps to disclose their data practices transparently, previous studies have pointed out difficulties in doing so. To delve further into this issue, this article describes an automated method to capture data-sharing practices in Android apps and assess their proper disclosure according to the EU General Data Protection Regulation (GDPR). We applied the method to 9,000 randomly selected Android apps, unveiling an uncomfortable reality: over 80% of Android applications that transfer personal data off device potentially fail to meet the GDPR’s transparency requirements. We further investigate the role of third-party libraries, shedding light on the source of this problem and pointing towards measures to address it.
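    As a toy illustration of the kind of check this involves (not the article's actual method), the sketch below flags observed off-device data flows whose recipient does not appear among the recipients disclosed in the policy; all domains and data types here are hypothetical.

        # Toy sketch (assumed inputs): flag personal data flows whose recipient
        # is not disclosed in the app's privacy policy. A real pipeline would
        # derive both sets from traffic analysis and policy text analysis.
        def undisclosed_transfers(observed_flows, disclosed_recipients):
            """Each flow is a (data_type, recipient_domain) pair."""
            return [flow for flow in observed_flows
                    if flow[1] not in disclosed_recipients]

        observed = [("advertising_id", "graph.facebook.com"),
                    ("location", "tracker.example.net")]
        disclosed = {"graph.facebook.com"}
        print(undisclosed_transfers(observed, disclosed))
        # -> [('location', 'tracker.example.net')]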